Speech enhancement for bandlimited speech

Authors

  • David A. Heide
  • George S. Kang
Abstract

Throughout the history of telecommunication, speech has rarely been transmitted with its full analog bandwidth (0 to 8 kHz or more) because of limitations in channel bandwidth. This impaired legacy continues in tactical voice communication. The passband of a voice terminal is typically 0 to 4 kHz, so high-frequency speech components (4 to 8 kHz) are removed prior to transmission. As a result, speech intelligibility suffers, particularly for low-data-rate vocoders. In this paper, we describe our speech-processing technique, which permits some of the upperband speech components to be translated into the passband of the vocoder. According to our test results, speech intelligibility is improved by as much as three to four points, even for the recently developed and excellent Department of Defense-standard Mixed Excitation Linear Predictor (MELP) 2.4 kb/s vocoder. Note that speech intelligibility is improved without expanding the transmission bandwidth or compromising interoperability with others.

INTRODUCTION

In analog voice transmission, speech bandwidth is limited by the channel bandwidth. In switched public telephone circuits or high-frequency radio channels, speech bandwidth is typically less than 4 kHz. With digital speech transmission, speech bandwidth still remains less than 4 kHz because the wider the speech bandwidth, the higher the data rate required for transmission. The 4 kHz speech bandwidth is standard in virtually all digital voice terminals used by the government and industry.

A speech bandwidth of 4 kHz is acceptable for the vowels spoken by a majority of speakers. The same cannot be said for consonants, particularly fricatives (/s/, /sh/, /ch/, etc.), because their spectra extend above 4 kHz (see Fig. 1). Good reproduction of consonants is vital to a proper understanding of speech because they provide important cues for sentence segmentation.

To improve narrowband speech, we developed a technique that shifts the fricative spectrum down below 4 kHz so that some of the fricative sounds can be heard over digital voice terminals. Our earlier effort spread fricative spectra by exploiting the aliasing effect (i.e., spectral folding around 4 kHz) [1]. That technique is not effective if there is no fricative spectrum in the 4 to 5 kHz region. Thus, we developed a new and more effective technique, which is capable of spreading the fricative spectrum even if it is present only above 5 kHz.
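
The spectral folding mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a 16 kHz wideband sampling rate, uses a pure 5 kHz tone as a stand-in for fricative energy, and the helper names `fold_by_decimation` and `dominant_frequency` are hypothetical. Dropping every other sample without an anti-aliasing filter halves the sampling rate to 8 kHz, so a component at a frequency f between 4 and 8 kHz reappears at 8 kHz minus f.

```python
import numpy as np

WIDEBAND_RATE = 16000                  # assumed wideband sampling rate (Hz)
NARROWBAND_RATE = WIDEBAND_RATE // 2   # 8 kHz rate, i.e. a 0 to 4 kHz passband


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest-magnitude FFT bin."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])


def fold_by_decimation(signal):
    """Halve the sampling rate by keeping every other sample.

    No anti-aliasing filter is applied on purpose, so energy above
    the new Nyquist frequency (4 kHz) folds back into the passband.
    """
    return signal[::2]


# One second of a 5 kHz tone, a hypothetical stand-in for fricative energy.
t = np.arange(WIDEBAND_RATE) / WIDEBAND_RATE
tone = np.sin(2 * np.pi * 5000 * t)
folded = fold_by_decimation(tone)

print(f"{dominant_frequency(tone, WIDEBAND_RATE):.0f} Hz")      # 5000 Hz
print(f"{dominant_frequency(folded, NARROWBAND_RATE):.0f} Hz")  # 3000 Hz
```

In this sketch the 5 kHz tone lands at 3 kHz, inside the 0 to 4 kHz passband.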

Related articles

Speech Enhancement Through an Optimized Subspace Division Technique

Speech enhancement techniques are often employed to improve the quality and intelligibility of noisy speech signals. This paper discusses a novel speech enhancement technique based on Singular Value Decomposition. The implementation uses a Genetic Algorithm based optimization method for reducing the effects of environmental noise from the singular vectors as well as t...

A Novel Frequency Domain Linearly Constrained Minimum Variance Filter for Speech Enhancement

A reliable speech enhancement method is important as a pre-processing step that improves the overall performance of speech applications. In this paper, we propose a novel frequency-domain method for single-channel speech enhancement. Conventional frequency-domain methods usually neglect the correlation between neighboring time-frequency components of the signals. In the proposed method, we take...

Speech enhancement using STC-based bandwidth extension

Telephone speech is typically bandlimited to 4 kHz, resulting in a ‘muffled’ quality. Coding speech with bandwidth greater than 4 kHz reduces this distortion, but requires a higher bit rate to avoid other types of distortion. An alternative to coding wider bandwidth speech is to exploit correlation between the 0-4 kHz and 4-8 kHz speech bands to resynthesize wideband speech from narrowband spee...

Speech enhancement based on hidden Markov model using sparse code shrinkage

This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a maximum a posteriori (MAP) estimator based on a Laplace-Gaussian combination (for clean speech and noise, respectively) in the HMM ...

A New Shuffled Sub-swarm Particle Swarm Optimization Algorithm for Speech Enhancement

In this paper, we propose a novel algorithm to enhance noisy speech in the framework of dual-channel speech enhancement. The new method is a hybrid optimization algorithm that combines the conventional θ-PSO and the shuffled sub-swarms particle optimization (SSPSO) technique. It is known that the θ-PSO algorithm has better optimization performance than standard PSO al...

Publication date: 1998